House Price Prediction using Regression Techniques

Abstract :

Importing Libraries

Import Data

Output Label Analysis

Data Cleaning

Null Value Analysis

A total of 19 columns contain missing values. In 5 of them, roughly 50% or more of the values are missing.
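A per-column missing-value report of this kind can be sketched as follows. This is a minimal illustration, not the notebook's original cell: the helper name `missing_summary` and the toy frame are assumptions, and the toy values are made up.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values, for columns that have any."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": 100 * counts / len(df),
    })
    # Keep only columns with at least one missing value, worst first.
    summary = summary[summary["missing"] > 0]
    return summary.sort_values("percent", ascending=False)


# Toy frame standing in for the real data (values are made up).
toy = pd.DataFrame({
    "PoolQC": [np.nan, np.nan, np.nan, "Gd"],
    "LotFrontage": [65.0, np.nan, 80.0, 70.0],
    "SalePrice": [208500, 181500, 223500, 140000],
})
report = missing_summary(toy)
print(report)
```

Counting the rows of such a report gives the number of affected columns, and filtering on `percent >= 50` gives the heavily incomplete ones.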

Null Value Imputation and Removal
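One common way to handle this step is sketched below. It is an assumption, not necessarily what this notebook did: columns missing at least half of their values are dropped, numeric gaps are filled with the median, and categorical gaps with the most frequent value. The helper name `drop_and_impute`, the 0.5 threshold, and the toy data are all hypothetical.

```python
import numpy as np
import pandas as pd


def drop_and_impute(df: pd.DataFrame, max_missing: float = 0.5) -> pd.DataFrame:
    """Drop columns with too many gaps, then fill the remaining gaps."""
    share = df.isnull().mean()  # fraction of missing values per column
    kept = df.loc[:, share < max_missing].copy()
    for col in kept.columns:
        if not kept[col].isnull().any():
            continue
        if pd.api.types.is_numeric_dtype(kept[col]):
            # Numeric column: fill with the median (assumed strategy).
            kept[col] = kept[col].fillna(kept[col].median())
        else:
            # Categorical column: fill with the most frequent value.
            kept[col] = kept[col].fillna(kept[col].mode().iloc[0])
    return kept


# Toy frame standing in for the real data (values are made up).
toy = pd.DataFrame({
    "PoolQC": [np.nan, np.nan, np.nan, "Gd"],
    "LotFrontage": [65.0, np.nan, 80.0, 70.0],
    "GarageType": ["Attchd", "Attchd", None, "Detchd"],
})
clean = drop_and_impute(toy)
print(clean)
```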

Duplicate Data Count

Outlier Detection

The output above shows outliers: 4 points lie clearly apart from the main cluster.
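The outliers here were spotted visually. As a complementary numeric check, the interquartile-range (IQR) rule can flag points that sit far from the bulk of the data. This is a different technique from the visual inspection, shown as a sketch: the function name `iqr_outliers`, the multiplier 1.5, and the sample values are assumptions.

```python
import pandas as pd


def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask: True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Made-up living-area values with one obviously extreme point.
area = pd.Series([1200, 1350, 1500, 1420, 1600, 1280, 5600], name="GrLivArea")
mask = iqr_outliers(area)
print(area[mask])
```

Rows flagged this way can be reviewed before deciding whether to drop them, for example with `df[~mask]`.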

Box Plot Analysis for Categorical Data

OverallQual

Sale price is clearly correlated with the overall quality of the house: the better the quality, the higher the sale price.
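The trend a box plot shows can also be checked numerically by comparing the median sale price per quality level. The sketch below uses a toy frame with made-up values; only the column names `OverallQual` and `SalePrice` come from the dataset.

```python
import pandas as pd

# Toy frame standing in for the real data (values are made up).
toy = pd.DataFrame({
    "OverallQual": [4, 4, 6, 6, 8, 8],
    "SalePrice": [100000, 120000, 160000, 180000, 280000, 320000],
})

# Median sale price for each quality level (the center line of each box).
medians = toy.groupby("OverallQual")["SalePrice"].median()
print(medians)

# True when the median rises with every step in quality.
print(medians.is_monotonic_increasing)
```

If matplotlib is installed, `toy.boxplot(column="SalePrice", by="OverallQual")` draws the corresponding box plot.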

OverallCond

Overall condition has less impact on the sale price.

YearBuilt

A slight pattern can be observed here: newer houses tend to sell at higher prices.

The neighbourhood also shows a mild trend. Houses in "NoRidge", "StoneBr", and "NridgHt" have the highest sale prices.

NAmes appears to have the largest number of houses, whereas Blueste and Veenker have far fewer records.

SaleCondition

Only a very small percentage of houses were sold under the Alloca and AdjLand conditions. Partial and Abnormal sales account for 12%.
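Category shares like these can be computed with `value_counts(normalize=True)`. The series below is a made-up stand-in, so its percentages do not match the figures reported above.

```python
import pandas as pd

# Made-up sale conditions standing in for the real column.
cond = pd.Series(["Normal"] * 8 + ["Partial", "Abnorml"], name="SaleCondition")

# Share of each category, expressed as a percentage of all rows.
share = cond.value_counts(normalize=True) * 100
print(share.round(1))
```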

Numerical Data Analysis

None of the numerical features in the dataset follows a normal distribution.
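Departure from normality can be quantified with the sample skewness, and a log transform is one common remedy for right-skewed features. The sketch below is an illustration under assumptions: the `LotArea` values are synthetic, and `log1p` is a suggested transform, not necessarily the one used in this notebook.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed values: evenly spaced on a log scale.
lot_area = pd.Series(np.exp(np.linspace(7, 12, 50)), name="LotArea")

# Skewness near 0 indicates a roughly symmetric distribution.
raw_skew = lot_area.skew()
log_skew = np.log1p(lot_area).skew()

print(f"skew before log1p: {raw_skew:.2f}")
print(f"skew after  log1p: {log_skew:.2f}")
```

On a full frame, `df.skew(numeric_only=True)` reports the skewness of every numeric column at once.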

Correlation

The individual values are hard to read from the output above.
The features above are the clearest cases of multicollinearity.
The strongest pair is GarageCars and GarageArea, which is expected: a larger garage area can hold more cars.
OverallQual and GrLivArea are the features most highly correlated with the target, SalePrice.
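When the full correlation output is hard to read, listing only the strongly correlated pairs can help. The following is a minimal sketch: the helper `top_correlated_pairs`, the 0.8 threshold, and the toy values are assumptions.

```python
import numpy as np
import pandas as pd


def top_correlated_pairs(df: pd.DataFrame, threshold: float = 0.8) -> pd.Series:
    """Feature pairs whose absolute Pearson correlation exceeds the threshold."""
    corr = df.corr().abs()
    # Keep the upper triangle only, so each pair appears once
    # and the diagonal (self-correlation) is excluded.
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack()
    return pairs[pairs > threshold].sort_values(ascending=False)


# Toy frame standing in for the real data (values are made up).
toy = pd.DataFrame({
    "GarageCars": [1, 2, 2, 3, 1, 2],
    "GarageArea": [250, 480, 500, 800, 280, 520],
    "YrSold": [2008, 2006, 2010, 2007, 2009, 2006],
})
pairs = top_correlated_pairs(toy)
print(pairs)
```

Correlation with the target alone can be read with `df.corr()["SalePrice"].sort_values(ascending=False)`, assuming all columns are numeric.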

Feature Engineering

Scaling - Encoding - Transforming

Splitting of data

Linear Regression
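As an illustration of the split-then-fit workflow, the sketch below fits ordinary least squares with NumPy on synthetic data and reports the test RMSE. It is a stand-in, not the notebook's model cell: the data, the 80/20 split, the seed, and the helper `add_intercept` are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: price grows linearly with living area, plus noise.
n = 200
X = rng.uniform(500, 3000, size=(n, 1))
y = 50000 + 100 * X[:, 0] + rng.normal(0, 10000, size=n)

# Shuffle the row indices, then hold out the last 20% for testing.
idx = rng.permutation(n)
split = int(0.8 * n)
train, test = idx[:split], idx[split:]


def add_intercept(features: np.ndarray) -> np.ndarray:
    """Prepend a column of ones so the model learns an intercept."""
    return np.column_stack([np.ones(len(features)), features])


# Ordinary least squares fit on the training rows only.
coef, *_ = np.linalg.lstsq(add_intercept(X[train]), y[train], rcond=None)

# Evaluate on the held-out rows.
pred = add_intercept(X[test]) @ coef
rmse = float(np.sqrt(np.mean((pred - y[test]) ** 2)))
print(f"intercept={coef[0]:.0f}, slope={coef[1]:.1f}, test RMSE={rmse:.0f}")
```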

RandomForest Regressor

XGBoost